CLAIMS 

What is claimed is: 

1. An apparatus, in an integrated circuit (IC) of a data processing system having at least 
one host processor and host memory, comprising: 

a chip interconnect; 

a memory controller for controlling the host memory comprising DRAM memory, the
memory controller coupled to the chip interconnect;

a scalar processing unit coupled to the chip interconnect, the scalar processing unit
being capable of executing instructions to perform scalar data processing;

a vector processing unit coupled to the chip interconnect, the vector processing unit
being capable of executing instructions to perform vector data processing; and

an input and output (I/O) interface coupled to the chip interconnect, the I/O interface
receiving/transmitting data from/to the scalar and/or vector processing units.

2. The apparatus of claim 1, further comprising a switch mechanism coupled to the chip
interconnect and coupled to the scalar processing unit and coupled to the vector
processing unit, the switch mechanism operable to receive multiple media data streams
from the I/O interface and dispatch the multiple media data streams to the scalar
processing unit and/or the vector processing unit.



3. The apparatus of claim 1, further comprising: 

multiple scalar processing units, the multiple scalar processing units being capable of
executing instructions to perform scalar processing simultaneously; and

multiple vector processing units, the multiple vector processing units being capable of
executing instructions to perform vector processing simultaneously.

4. The apparatus of claim 3, further comprising multiple scalar processing units of a kind
and multiple vector processing units of a kind.

5. The apparatus of claim 1, further comprising: 

a general purpose register (GPR) file coupled to the scalar processing unit; 
a vector register (VR) file coupled to the vector processing unit; and 
a load and store unit (LSU), the LSU being capable of executing instructions to load 
and store scalar data from and to the GPR, and the LSU being capable of 
executing instructions to load and store vector data from and to the VR. 

6. The apparatus of claim 5, further comprising a memory location coupled to the chip 
interconnect, wherein the LSU loads and stores data from and to the memory location. 

7. The apparatus of claim 6, further comprising a direct memory access (DMA) engine, the 
DMA engine transferring the multiple media data between the memory location and the 
host memory. 

8. The apparatus of claim 5, wherein the LSU is capable of executing instructions to load 
and store various formats of scalar and vector data, wherein the various formats 
comprise 8-bit, 16-bit, and 32-bit formats. 
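
By way of illustration only, the multi-format load and store behavior recited in claim 8 may be sketched as follows. The flat byte-addressed memory, the little-endian byte order, the unsigned interpretation, and all function names are assumptions made for this sketch and are not recited in the claims.

```python
import struct

# Assumption: little-endian, unsigned encodings for the three recited widths.
_FORMATS = {8: "<B", 16: "<H", 32: "<I"}


def store_scalar(memory, addr, value, bits):
    """Store one unsigned element of the given bit width at byte address addr."""
    struct.pack_into(_FORMATS[bits], memory, addr, value)


def load_scalar(memory, addr, bits):
    """Load one unsigned element of the given bit width from byte address addr."""
    return struct.unpack_from(_FORMATS[bits], memory, addr)[0]


def store_vector(memory, addr, values, bits):
    """Store consecutive elements of the given width starting at addr."""
    step = bits // 8
    for i, value in enumerate(values):
        store_scalar(memory, addr + i * step, value, bits)


def load_vector(memory, addr, bits, lanes):
    """Load `lanes` consecutive elements of the given width starting at addr."""
    step = bits // 8
    return [load_scalar(memory, addr + i * step, bits) for i in range(lanes)]
```

In this sketch the same bytes can be read back at a different width, for example storing one 32-bit value and loading it as four 8-bit lanes.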

9. The apparatus of claim 1, wherein the switch mechanism comprises an instruction unit
(IUNIT), the IUNIT controlling and dispatching instructions simultaneously.

10. The apparatus of claim 9, wherein the instructions comprise very long instruction word
(VLIW) instructions.

11. The apparatus of claim 9, wherein the IUNIT further comprises:

a program counter;

a branch unit, wherein the program counter and the branch unit determine the location
to fetch next instructions;

an instruction cache memory, the instruction cache memory comprising instruction
cache tag and data memories for buffering instructions transmitted from the
host memory; and

at least one memory mapped register accessible by the host.
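
By way of illustration only, the manner in which a program counter and a branch unit may together determine the next fetch location, as recited in claim 11, may be sketched as follows. The function name, the fixed instruction word size, and the taken/not-taken signalling are assumptions of this sketch and are not recited in the claims.

```python
def next_fetch_address(pc, word_bytes, branch_taken=False, branch_target=None):
    """Return the address of the next instruction word to fetch.

    Assumption: instruction words have a fixed size of word_bytes, and the
    branch unit reports whether a branch is taken together with its target.
    """
    if branch_taken:
        if branch_target is None:
            raise ValueError("a taken branch requires a target address")
        return branch_target
    # Sequential case: advance the program counter by one instruction word.
    return pc + word_bytes
```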

12. The apparatus of claim 1, wherein the scalar processing unit comprises: 

an integer arithmetic and logic unit (IALU), the IALU being capable of executing
instructions to perform simple scalar integer arithmetic and logical operations;
and

an integer shift unit (ISHU), the ISHU being capable of executing instructions to
perform scalar bit shifting and rotating operations.

13. The apparatus of claim 12, wherein the scalar processing unit further comprises a 
floating point unit (FPU), the FPU being capable of executing instructions to perform 
high precision scalar data processing. 

14. The apparatus of claim 1, wherein the vector processing unit comprises: 

a vector permute unit (VPU), the VPU being capable of executing instructions to
perform vector permute operations;

a vector simple integer unit (VSIU), the VSIU being capable of executing instructions
to perform vector simple integer arithmetic and logical operations;

a vector complex integer unit (VCIU), the VCIU being capable of executing
instructions to perform vector complex integer arithmetic operations; and

a vector look-up table unit (VLUT), the VLUT being capable of executing instructions
to perform at least one vector table look-up.
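
By way of illustration only, three of the vector operations named in claim 14 (permute, simple integer arithmetic, and table look-up) may be sketched as follows. Vectors are modelled as Python lists, and the 8-bit wrapping addition is an assumed example of a simple integer operation; none of these names or semantics are recited in the claims.

```python
def vector_permute(src, indices):
    """Rearrange the elements of src according to a vector of source indices."""
    return [src[i] for i in indices]


def vector_add_u8(a, b):
    """Element-wise addition of 8-bit lanes, wrapping modulo 256 (assumed)."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def vector_table_lookup(table, indices):
    """Replace each index lane with the corresponding entry of a look-up table."""
    return [table[i] for i in indices]
```

For example, permuting with indices in descending order reverses a vector, and a table of squares maps each index lane to its square.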

15. The apparatus of claim 14, wherein the vector processing unit further comprises a vector 
floating point unit (VFPU), the VFPU being capable of executing instructions to 
perform high precision vector data processing. 

16. The apparatus of claim 14, wherein the VLUT comprises a memory location storing at
least one look-up table (LUT).

17. The apparatus of claim 16, wherein data of the LUT are transferred from the host 
memory to the memory location through a direct memory access (DMA) operation. 

18. The apparatus of claim 16, wherein the memory location comprises a static random 
access memory (SRAM). 

19. The apparatus of claim 1, wherein the scalar and vector processing units are capable of 
performing data processing autonomously and asynchronously to the host processor. 

20. The apparatus of claim 1, wherein the scalar and vector processing units communicate
with the host processor through an interrupt mechanism.

21. The apparatus of claim 1, wherein the scalar and vector processing units are accessible 
by the host processor, through a set of memory mapped addresses. 

22. The apparatus of claim 1, wherein the IC may be a co-processor to the host, wherein the 
IC may be a stand-alone processor coupled to a bus of the data processing system, and 
wherein the chipset may be a core logic chip having a host interface coupled to the host 
processor and memory interface coupled to the host memory. 

23. The apparatus of claim 5, further comprising a special purpose register (SPR) file coupled
to the chip interconnect.

24. A method, in an integrated circuit (IC) having a chip interconnect, of a data processing 
system having at least one host processor and a host memory, the method comprising: 

receiving a data stream from an input/output (I/O) interface coupled to the chip
interconnect;

examining the data to determine whether the data require scalar data processing or
vector data processing;

performing scalar data processing on the data in the IC, if the data require scalar data
processing; and

performing vector data processing on the data in the IC, if the data require vector data
processing.
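
By way of illustration only, the receiving, examining, and performing steps of claim 24 may be sketched as follows. The rule used to decide whether data require vector processing (here, whether an item is a sequence) and all function names are assumptions of this sketch; the claims do not recite a particular test.

```python
def requires_vector_processing(item):
    """Hypothetical examination rule: sequences are treated as vector data."""
    return isinstance(item, (list, tuple))


def process_stream(stream, scalar_op, vector_op):
    """Examine each received item and apply scalar or vector processing."""
    results = []
    for item in stream:
        if requires_vector_processing(item):
            results.append(vector_op(item))
        else:
            results.append(scalar_op(item))
    return results
```

A caller supplies the two processing functions, for example an increment for scalar items and an element-wise doubling for vector items.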

25. The method of claim 24, further comprising: 

a scalar processing unit coupled to the chip interconnect, the scalar processing unit
performing scalar data processing on the data; and

a vector processing unit coupled to the chip interconnect, the vector processing unit
performing vector data processing on the data.

26. The method of claim 25, further comprising multiple scalar processing units performing 
scalar data processing simultaneously and multiple vector processing units performing 
vector data processing simultaneously. 

27. The method of claim 25, further comprising: 

dispatching the data to the scalar processing unit if the data require scalar data 
processing; and 

dispatching the data to the vector processing unit if the data require vector data 
processing. 

28. The method of claim 27, wherein the dispatching is performed by a switch mechanism 
coupled to the chip interconnect, the switch mechanism receiving the data from the I/O 
interface. 

29. The method of claim 28, wherein the switch mechanism comprises an instruction unit
(IUNIT), the IUNIT being capable of decoding the data.

30. The method of claim 24, further comprising: 

a general purpose register (GPR) file coupled to the scalar processing unit; 

a vector register (VR) file coupled to the vector processing unit; 

a special purpose register (SPR) coupled to the chip interconnect; and 

a load and store unit (LSU), the LSU being capable of executing instructions to load
and store scalar data from and to the GPR, and the LSU being capable of
executing instructions to load and store vector data from and to the VR.

31. The method of claim 30, further comprising a memory location coupled to the chip 
interconnect, wherein the LSU loads and stores data from and to the memory location. 

32. The method of claim 31, further comprising transferring the data between the memory 
location and the host memory, through a direct memory access (DMA) operation. 

33. The method of claim 24, wherein the scalar processing unit comprises: 

an integer arithmetic and logic unit (IALU), the IALU being capable of executing
instructions to perform simple scalar integer arithmetic and logical operations;

an integer shift unit (ISHU), the ISHU being capable of executing instructions to 
perform scalar bit shifting and rotating operations; and 

a floating point unit (FPU), the FPU being capable of executing instructions to 
perform high precision scalar data processing. 

34. The method of claim 24, wherein the vector processing unit comprises: 

a vector permute unit (VPU), the VPU being capable of executing instructions to
perform vector permute operations;

a vector simple integer unit (VSIU), the VSIU being capable of executing instructions
to perform vector simple integer arithmetic and logical operations;

a vector complex integer unit (VCIU), the VCIU being capable of executing
instructions to perform vector complex integer arithmetic operations;

a vector look-up table unit (VLUT), the VLUT being capable of executing instructions
to perform at least one vector table look-up; and

a vector floating point unit (VFPU), the VFPU being capable of executing instructions
to perform high precision vector data processing.

35. The method of claim 34, wherein the VLUT comprises a memory location storing at 
least one look-up table (LUT). 

36. The method of claim 35, further comprising transferring data of the LUT from the host 
memory to the memory location, through a direct memory access (DMA) operation. 

37. The method of claim 24, wherein the scalar data processing and vector data processing 
are performed autonomously and asynchronously to the host processor. 

38. The method of claim 25, wherein the scalar processing unit and the vector processing 
unit communicate with the host processor through an interrupt mechanism. 

39. The method of claim 25, wherein the scalar processing unit and the vector processing
unit are accessible by the host processor through a set of memory mapped addresses.

40. An apparatus, in an integrated circuit (IC) having a chip interconnect, of a data 
processing system having at least one host processor and a host memory, the apparatus 
comprising: 

means for receiving a data stream from an input/output (I/O) interface coupled to the
chip interconnect;

means for examining the data to determine whether the data require scalar data
processing or vector data processing;

means for performing scalar data processing on the data in the IC, if the data require
scalar data processing; and

means for performing vector data processing on the data in the IC, if the data require
vector data processing.

41. The apparatus of claim 40, further comprising:

a scalar processing unit coupled to the chip interconnect, the scalar processing unit
performing scalar data processing on the data; and

a vector processing unit coupled to the chip interconnect, the vector processing unit
performing vector data processing on the data.

42. The apparatus of claim 41, further comprising multiple scalar processing units 
performing scalar data processing simultaneously and multiple vector processing units 
performing vector data processing simultaneously. 

43. The apparatus of claim 41, further comprising: 

means for dispatching the data to the scalar processing unit if the data require scalar 
data processing; and 

means for dispatching the data to the vector processing unit if the data require vector 
data processing. 

44. The apparatus of claim 43, wherein the dispatching is performed by a switch mechanism 
coupled to the chip interconnect, the switch mechanism receiving the data from the I/O 
interface. 

45. The apparatus of claim 44, wherein the switch mechanism comprises an instruction unit
(IUNIT), the IUNIT being capable of decoding the data.

46. The apparatus of claim 40, further comprising: 

a general purpose register (GPR) file coupled to the scalar processing unit; 

a vector register (VR) file coupled to the vector processing unit; 

a special purpose register (SPR) coupled to the chip interconnect; and 

a load and store unit (LSU), the LSU being capable of executing instructions to load 
and store scalar data from and to the GPR, and the LSU being capable of 
executing instructions to load and store vector data from and to the VR. 

47. The apparatus of claim 46, further comprising a memory location coupled to the chip 
interconnect, wherein the LSU loads and stores data from and to the memory location. 

48. The apparatus of claim 47, further comprising means for transferring the data between 
the memory location and the host memory, through a direct memory access (DMA) 
operation. 

49. The apparatus of claim 40, wherein the scalar processing unit comprises: 

an integer arithmetic and logic unit (IALU), the IALU being capable of executing
instructions to perform simple scalar integer arithmetic and logical operations;

an integer shift unit (ISHU), the ISHU being capable of executing instructions to 
perform scalar bit shifting and rotating operations; and 

a floating point unit (FPU), the FPU being capable of executing instructions to 
perform high precision scalar data processing. 

50. The apparatus of claim 40, wherein the vector processing unit comprises:

a vector permute unit (VPU), the VPU being capable of executing instructions to
perform vector permute operations;

a vector simple integer unit (VSIU), the VSIU being capable of executing instructions
to perform vector simple integer arithmetic and logical operations;

a vector complex integer unit (VCIU), the VCIU being capable of executing
instructions to perform vector complex integer arithmetic operations;

a vector look-up table unit (VLUT), the VLUT being capable of executing instructions
to perform at least one vector table look-up; and

a vector floating point unit (VFPU), the VFPU being capable of executing instructions
to perform high precision vector data processing.

51. The apparatus of claim 50, wherein the VLUT comprises a memory location storing at 
least one look-up table (LUT). 

52. The apparatus of claim 51, further comprising means for transferring data of the LUT
from the host memory to the memory location, through a direct memory access (DMA)
operation.

53. The apparatus of claim 40, wherein the scalar data processing and vector data processing
are performed autonomously and asynchronously to the host processor.

54. The apparatus of claim 41, wherein the scalar processing unit and the vector processing
unit communicate with the host processor through an interrupt mechanism.

55. The apparatus of claim 41, wherein the scalar processing unit and the vector processing
unit are accessible by the host processor through a set of memory mapped addresses.

56. A machine readable medium having stored thereon executable code which causes a 
machine to perform a method, in an integrated circuit (IC) having a chip interconnect, of 
a data processing system having at least one host processor and a host memory, the 
method comprising: 

receiving a data stream from an input/output (I/O) interface coupled to the chip
interconnect;

examining the data to determine whether the data require scalar data processing or
vector data processing;

performing scalar data processing on the data in the IC, if the data require scalar data
processing; and

performing vector data processing on the data in the IC, if the data require vector data
processing.

57. The machine readable medium of claim 56, wherein the method further comprises: 

a scalar processing unit coupled to the chip interconnect, the scalar processing unit
performing scalar data processing on the data; and

a vector processing unit coupled to the chip interconnect, the vector processing unit
performing vector data processing on the data.

58. The machine readable medium of claim 57, wherein the method further comprises 
multiple scalar processing units performing scalar data processing simultaneously and 
multiple vector processing units performing vector data processing simultaneously. 

59. The machine readable medium of claim 57, wherein the method further comprises:

dispatching the data to the scalar processing unit if the data require scalar data 
processing; and 

dispatching the data to the vector processing unit if the data require vector data 
processing. 

60. The machine readable medium of claim 59, wherein the dispatching is performed by a 
switch mechanism coupled to the chip interconnect, the switch mechanism receiving the 
data from the I/O interface. 

61. The machine readable medium of claim 60, wherein the switch mechanism comprises an
instruction unit (IUNIT), the IUNIT being capable of decoding the data.

62. The machine readable medium of claim 56, wherein the method further comprises: 

a general purpose register (GPR) file coupled to the scalar processing unit; 

a vector register (VR) file coupled to the vector processing unit; 

a special purpose register (SPR) coupled to the chip interconnect; and 

a load and store unit (LSU), the LSU being capable of executing instructions to load 
and store scalar data from and to the GPR, and the LSU being capable of 
executing instructions to load and store vector data from and to the VR. 

63. The machine readable medium of claim 62, wherein the method further comprises a 
memory location coupled to the chip interconnect, wherein the LSU loads and stores 
data from and to the memory location. 



04860.P2691 Patent Application

64. The machine readable medium of claim 63, wherein the method further comprises 
transferring the data between the memory location and the host memory, through a direct 
memory access (DMA) operation. 

65. The machine readable medium of claim 56, wherein the scalar processing unit 
comprises: 

an integer arithmetic and logic unit (IALU), the IALU being capable of executing
instructions to perform simple scalar integer arithmetic and logical operations;

an integer shift unit (ISHU), the ISHU being capable of executing instructions to 
perform scalar bit shifting and rotating operations; and 

a floating point unit (FPU), the FPU being capable of executing instructions to 
perform high precision scalar data processing. 

66. The machine readable medium of claim 56, wherein the vector processing unit 
comprises: 

a vector permute unit (VPU), the VPU being capable of executing instructions to
perform vector permute operations;

a vector simple integer unit (VSIU), the VSIU being capable of executing instructions
to perform vector simple integer arithmetic and logical operations;

a vector complex integer unit (VCIU), the VCIU being capable of executing
instructions to perform vector complex integer arithmetic operations;

a vector look-up table unit (VLUT), the VLUT being capable of executing instructions
to perform at least one vector table look-up; and

a vector floating point unit (VFPU), the VFPU being capable of executing instructions
to perform high precision vector data processing.

67. The machine readable medium of claim 66, wherein the VLUT comprises a memory
location storing at least one look-up table (LUT).

68. The machine readable medium of claim 67, wherein the method further comprises 
transferring data of the LUT from the host memory to the memory location, through a 
direct memory access (DMA) operation. 

69. The machine readable medium of claim 56, wherein the scalar data processing and 
vector data processing are performed autonomously and asynchronously to the host 
processor. 

70. The machine readable medium of claim 57, wherein the scalar processing unit and the
vector processing unit communicate with the host processor through an interrupt
mechanism.

71. The machine readable medium of claim 57, wherein the scalar processing unit and the
vector processing unit are accessible by the host processor through a set of memory
mapped addresses.